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Adaptive Monte Carlo methods are very efficient techniques de- 
signed to tune simulation estimators on-line. In this work, we present 
an alternative to stochastic approximation to tune the optimal change 
of measure in the context of importance sampling for normal ran- 
dom vectors. Unlike stochastic approximation, which requires very 
fine tuning in practice, we propose to use sample average approxi- 
mation and deterministic optimization techniques to devise a robust 
and fully automatic variance reduction methodology. The same sam- 
ples are used in the sample optimization of the importance sampling 
parameter and in the Monte Carlo computation of the expectation 
of interest with the optimal measure computed in the previous step. 
We prove that this highly dependent Monte Carlo estimator is con- 
vergent and satisfies a central limit theorem with the optimal limit- 
ing variance. Numerical experiments confirm the performance of this 
estimator: in comparison with the crude Monte Carlo method, the 
computation time needed to achieve a given precision is divided by a 
factor between 3 and 15. 

Introduction. We are interested in the computation of E(/(G)), where 
G = {G^ , . . . jC^) is a d-dimensional standard normal random vector and 
/ : — > M is a measurable function such that f{G) is integrable. This prob- 
lem is particularly important in mathematical finance, where the calculation 
of the price and hedging ratios of European options in multidimensional 
Black-Scholes models amounts to the computation of E(/(G)) for a well- 
chosen function /. The same is true when the underlying assets follow more 
complex dynamics given by stochastic differential equations, which can be 
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discretized using the Euler scheme, for instance. We assume that the random 
variable f{G) is nonzero and shghtly more than square- integrable: 

(0.1) P(/(G)/0)>0, 

(0.2) V^gM'^ E(/2(G)e-^-^) < +00. 

By Holder's inequality, (0.2) holds provided E(|/|^+^(G)) < +oo for some 
£ > 0. For any measurable function h:W^ — > M either nonnegative or such 
that E|/i(G)| < +00, one has 

(0.3) WeR'^ E(/i(G)) =E(/i(G + 0)e-^-^-l''l'/2). 

Applying this equality to h{x) = f{x) and to h{x) = /^(2;)e~^'^+l^l^/^, one 
obtains that the expectation and the variance of the random variable f{G + 
Qy-e-G-\e\^ /2 gq^g^i E(/(G)) and vf {6) -^'^ {f {G)) , respectively, where 

t;/(0)1^'E(/2(G)e-^-^+l^l'/2). 

As a consequence, if (Gj)j>i denotes a sequence of i.i.d. d-dimensional stan- 
dard normal random vectors, for any importance sampling parameter 6 G W^, 

1 " 

Mn(^,/)'='-E/(G^ + e)e-'-^'-l'l'/' 
n 

1=1 

is an unbiased and convergent estimator of E(/(G)). Since n Var(M„(0, /)) = 
{6) — E^(/(G)), to improve the accuracy of the estimation for a fixed num- 
ber n of random samples, one should choose 9 minimizing v^{6). The first 
section of this paper addresses this minimization problem. First, we check 
that is a strongly convex function going to infinity at infinity, which en- 
sures the existence of a unique value Ol such that {Ol) = mig^T^dV^ {9). 
Of course, when E(/(G)) is unknown, in general, so is the function . 
Therefore, direct optimization of this function is not implement able. Using 
a large deviations argument, Glasserman, Heidelberger and Shahabuddin 
[8] suggest the use of maximizing log|/(0)| — where, by convention, 
log(O) = — cxD. However, this choice is not optimal and the numerical search 

for a local maximum of log |/(0)| — is only possible if the function / pos- 
sesses some regularity. Under (0.2), the function is infinitely continuously 
differentiable and such that 

(0.4) Vev^{e) = E((0 - G)/2(G)e-^-^+l^l'/2). 

At this stage, we can understand the appeal of performing the change of 
measure (0.3) to transform E(/2(G + 6')e-2^-^-l^l') into the above expression 
of : no smoothness assumption on the function / is required in order to 
differentiate within the expectation. 
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Arouna [1, 2] takes advantage of the characterization of the optimal pa- 
rameter Ol as the unique solution of the equation E((0 — G)/^(G)e~^''^"'"l^l^/^) = 
0, in order to approximate it by a Robbins-Monro procedure. The stan- 
dard Robbins-Monro algorithm explodes, but it can be stabilized using ran- 
dom truncation techniques; see, for instance, [3, 4] or [12]. According to 
[1], the same random drawings Gi may be used to simultaneously estimate 
the optimal parameter 9( and the expectation of interest E(/(G')). More- 
over, both estimators are strongly consistent and the estimator of E(/(G)) 
is asymptotically normal with an asymptotic variance equal to the opti- 
mal one, {Oi) — 'E?{f{G)). Asymptotic normality of the estimator of Ol is 
discussed in [11]. 

Tuning the increasing sequence of compact subsets used in randomly trun- 
cated procedures is not easy. In [13], Lemaire and Pages note that using (0.3) 
in (0.4) leads to Vev^{e) = el^l'E((20 - G)f{G - 9)) and propose to use the 
characterization of Oi, as the unique solution of E((2^ — G)f'^{G — 9)) = 
in order to approximate it by a Robbins-Monro procedure. As soon as the 
function / satisfies some exponential growth assumptions at infinity, the al- 
gorithm they propose is stable without resorting to random truncation tech- 
niques. Starting from the present Gaussian framework, Lemaire and Pages 
[13] extend this construction of non-exploding Robbins-Monro algorithms 
to a large class of families of multidimensional probability distributions and 
even to diffusion process distributions. 

In the present paper, we propose and study an alternative approach, 
one which does not require the delicate tuning of the gain sequence which 
is still necessary to ensure the stability of Robbins-Monro procedures. 
When f{Gi) ^ for some index i G {1, . . . ,n} [by (0.1), a.s. this condition 

is satisfied for n large enough], the Monte Carlo approximation vl^{9) =^ 
^ Ya=i /^(Gi)e"^''^'"'"l^l of the function is also strongly convex and go- 
ing to infinity at infinity. This ensures the existence of a unique parameter 
Ol^ such that v(^{6f^) = infggjgd t)^(0). The function is of class G°° and its 
gradient and Hessian matrices 

Vevi{9)=^-j:{9-G,)f\G,)e-'-^^^\'\''\ 
n 

1=1 

^WniO) = -T.^h + {e- G.,){9 - G.)*)/2(G.)e"^-^»+l^l^/2 
1=1 

are easily computed if the random samples (Gj)i<j<„ are stored in the com- 
puter memory. Therefore, 9(^ can be computed with high precision by a few 
(four or five) steps of Newton's optimization procedure. In fact, we calculate 
0^ as the unique critical point of a modified function defined in Section 
3.1 and such that the Hessian matrix V^u^ is greater than the identity and 
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precised in Section 3.1. In contrast with stochastic approximation proce- 
dures, no tuning is necessary for this optimization algorithm. This is the 
reason for the adjective "robust" in the title of the paper. We propose to 
estimate E(/(G)) by M„(0^,/). In the context of control variate variance 
reduction techniques, Kim and Henderson [9] also propose sample average 
optimization of the control variate parameters as an alternative to stochas- 
tic approximation techniques. However, in their algorithm, the expectation 
of interest is only computed in a second step involving random variables 
independent of the ones used in the optimization step. In our algorithm, 
in order to save computation time, the respective approximations Of^ and 
-^n(^n)/) of the optimal parameters and of the expectation of interest are 
computed using the same set of random samples (G'i)i<j<„. This makes the 
mathematical analysis of the properties of M„(0^,/) more complicated: for 
instance, in general, Mn{Ol,f) is a biased estimator of E(/(G)). So far, 
this idea of using the same samples both for the optimization procedure 
and for the Monte Carlo computation has mainly been investigated in the 
much simpler context of linear control variates; see [7], Section 4.1 and the 
references therein. More precisely, for the computation of the expectation 
E(X) of the random variable X using 6 -Y as a control variate, where Y 
is a centered d-dimensional random vector, the strong law of large numbers 
and the central limit theorem enable us to deduce the asymptotic behaviour 
of the estimator ^J2i'=ii-^i ~ • Yi), where (Xj,li)j>i is an i.i.d. sample 
of the law of {X,Y) with empirical mean = ^J27=i{^i^Yi) ^^'^ 

On = (EtiiYi - YnWi - Ynr)-^j:7=i{Xi " X^Wi - minimiz_es the 
sample average approximation 6 ^ ^ J27=i{^i — & ' Yi — Xn + ■ Yn)"^ of 
Yar{X-e-Y). 

The first section of the paper is devoted to the convergence of el to 
9i : almost sure convergence holds and a central limit theorem can be de- 
rived under the reinforced integrability condition (1.2). Moreover, vl^O^) 
converges a.s. to {Ol). The second section addresses the asymptotic prop- 
erties, as n — > oo, of our estimator Mn{6l, f) of the expectation of in- 
terest E(/(G)). We prove that when / is continuous and such that for 
all M > 0, E(sup|5,|<^,f + 6')|) < +oo, then Mn{9l, f) converges a.s. to 
E(/(G)). In dimension d=l, this continuity assumption may be relaxed: 
the strong consistency of Mn{Ol,f) still holds provided / is the sum of a 
continuous function (as before) and a function of finite variation satisfy- 
ing some natural growth condition. When / satisfies (1.2) and can be de- 
composed as the sum of a locally Holder continuous function with some 
natural control of the growth of the Holder continuity constant and of 
a function satisfying some integrability conditions, then the estima- 
tor Mn{Ol,f) is asymptotically normal with optimal asymptotic variance 
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Moreover, \/n- 



A/i(0,l), which enables us to construct 



confidence intervals for E(/(G)). Again, in dimension (i = 1, the conclusion 
is preserved if one adds a function with finite variation satisfying some nat- 
ural growth condition to the previous decomposition. In the last section, we 
illustrate our theoretical results with numerical experiments which confirm 
the performance of our algorithm. 

1. Convergence of the importance sampling parameters. According to 
our numerical experiments, it may be optimal, in terms of the computation 
time needed to achieve a given precision for the estimation of E(/(G')), 
to search for the best importance sampling parameter in a subspace 
{A^-.-d G W^'} of W'-, where A e W'-^'^' is a matrix with rank d' < d. When 
f{G) corresponds to the payoff of an option written on a d'-dimensional 
Black-Scholes model monitored on a regular time grid, it is sensible to use 
the same parameter for each coordinate G'' corresponding to a time incre- 
ment of a given Brownian coordinate. In Section 3.2, the choice of the cor- 
responding matrix A is made precise for the standard Black-Scholes model 
and the multidimensional Black-Scholes model in the examples of the one- 
dimensional barrier option and the barrier basket option, respectively. For 
this choice, according to our numerical experiments, the variances obtained 
with and without parameter reduction are nearly the same. That is why we 
introduce 



Since ^■^'"^(^9) = {A'd), the properties of the function U'^'"^ may be deduced 
from those of . The case treated in the Introduction corresponds to the 
particular choices d' = d and A = I^- 



Lemma 1.1. Under (0.2), the function is infinitely continuously dif- 
ferentiable with, for all a = {a^ , . . . ,a'^) G N'^ and all = {0^ , . . . ,6'^) £W^, 



Under (0.1), the function is strongly convex and hence is such that 
lim|e|_+oo vf{e) = +00. 

Proof. The function /^(G')e~^''^"'~l^l^/^ is infinitely continuously 
differentiable with 9f2^Q^^-e-G+\e\y2 ^ /2((^)(^i _ Qj-^^-e-G+mV^ _ ^^^^^ 



V 




sup \dQjf'^{G)e 



0-G+|e|2/2 



\e\<M 
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fc=i 

where the right-hand side is integrable by (0.2), Lebesgue's theorem ensures 
that vf is continuously differentiable with jr-vf(9) = E(p(G)(e^ - G^) x 

Q-e-G+\e\ Higher-order differentiabihty properties are obtained by sim- 
ilar arguments and, in particular, g . {Q) =E((l|j=j} -|- {Q^ — G^){6^ — 
G'))/2(G)e-^-G+l''lV2). 

Assumption (0.1) ensures the existence of e > such that P(/^(G) > 
e, \G\ < i) > 0. Since E(/2(G)e-^-«+l^l'/2) > ee-I^IA+l^l'/2p(/2(G) > e, |G| < 
i), one easily deduces that lim|0|^+oo IE(/^(G)e-^-^+l^l''/2) = +oo. As the 
continuous function ^ E(/^(G)e^^''^+l^l^/^) does not vanish, the Hessian 
matrix V^f (6) is uniformly bounded from below by the positive definite 
matrix infggjgd E(/^(G)e~^''^"'"l^l^/^)/rf. This yields the strong convexity of 
the function . □ 

As a consequence, V'^''^ is a strongly convex function going to infinity 
at infinity and there exists a unique "dl'^ G M*^' such that V'I''^{'&1'^) = 
inf^ggd' v^'^^i-d). 

Let (Gj)j>i be a sequence of d-dimensional independent standard normal 
random variables. For n large enough, f{Gi) ^ for some index i G {1, . . . , n} 
and the approximation vl^^{d) = i J2i=i /2(Gi)e-^'^-^^+l^'^l'/2 of u-^'^(t?) is 
also strongly convex and such that lim|^|^^oQ u^'"^(i?) = +oo. Hence, there 
exists a unique iJ^'^ e R'^' such that v^'^i'dl'^) = inf^^jgd' vl'^{'&). The fol- 
lowing proposition describes the asymptotic behavior of as n — > oo. 
In order to get the central limit theorem, we need the following reinforced 
integr ability condition: 

(1.2) V^gR'^ E(/^(G)e-^-^) < +00. 

Proposition 1.2. Under (0.1) and (0.2), and vf^^{'d(^^) con- 

verge a.s. to "dt'^ and v^'^{'dl'^) as n— >oo. //, moreover, (1.2) holds, then 
V^(^?/'^ - ^l^^) ^ Md' {0,r), where 

r = [vV'^(,9{'^)]-i 

X CoviA*iA^l^^ - G)/2(G)e-^''''^-^+l^'''''^l'/2)[v2^/'^(^/-^)]-i 
and V^v-f^^{i^)=E{A*{Id + {A-d - G){A^ - G)*)^/2(G)e^^*^+l^''l'/2). 
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Remark 1.3. The Hessian matrix V^f ■^'"^(■i?) is positive definite under 
(0.1) and (0.2). If, moreover, (1.2) liolds, using the inequality IG^^I < e*^ + 
e"'^'' for all 1 < A; < d, one obtains that the covariance matrix Cov{{A'dl'^ — 
G)/2(G)e-^'^*'^-^+l^'^*'^l'/2) exists and T is well defined. 

To obtain some insights into the expression of this asymptotic covariance 
matrix, note that if (j){-d,x) = f'^{x)e~^'^'^~^^^^^ then by subtracting ^ x 

Er=iV^(/'(i9{'^,Gi) from both sides of the equation Vvf.'^^ii^l'^) = 
'Vv'^'^{'dl'^) and multiplying by ^/n, one obtains 




To prove the proposition, we use the following uniform strong law of large 
numbers, which is a restatement of [14], Lemma Al. This result is also 
a consequence of the strong law of large numbers in Banach spaces [10], 
Corollary 7.10, page 189. 

Lemma 1.4. Let (Xj)j>i be a sequence of i.i.d. W^-valued random vec- 
tors and hiM.'^ X -^M. be a measurable function. Assume that 

• a.s., ^ G M'^ h{9,Xi) is continuous; 
. VM > 0,E(sup|e|<A,/ \hi9,Xi)\) < +00. 

Then, a.s., G R'^ ^J2^^ih(9, Xi) converges locally uniformly to the con- 
tinuous function 9 £ R'^ h^E{h{9,Xi)). 

Proof of Proposition 1.2. Since, for M > 0, 

sup /2(G)e-^-G+l^lV2 < eMV2 ^2(^) -Q ^^MG^ ^ ^-MG"-^^ 
\9\<M k=l 

where the right-hand side is integrable by (0.2), applying Lemma 1.4 with 
{Xi)i>i = (Gj)j>i and h{9,x) = f'^(x)e~^'^^^^^^^'^ ensures that a.s., con- 
verges locally uniformly to . We restrict ourselves to a subset with proba- 
bility one of the original probability space on which this convergence holds. 
Let e > 0. By the strict convexity and the continuity of v^'^, 

a^lf inf z;/'^(^)-z;/'^(i?/'^)>0. 

■d:\-d--dl'^\>e 

The local uniform convergence of v^'"^ to v^'^ ensures that 

3n„ G N*,Vn > Vi? s.t. \^ - < e, - vf^^{^)\ < ^. 
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For n>na and '& such that ji? — i^l'^l > e, we deduce, using the convexity 
of vfi^^ for the first inequaUty, that 

Since vl^^{¥^^) < we conclude that 1??/'^ - < e for n > ii^- 

Therefore, "i?^'"^ converges a.s. to "dl'^. By combining this last result with 
the local uniform convergence of t;^'^ to the continuous function v-^'^, we 
deduce that vfi^^{'df^^) converges a.s. to v^'^{'dl'^). 

By (1.1) and (0.2), for M > 0, E(sup|e|<j^^ | Ve/2(G)e^^-^+l^l'/2|) < +oo. 

Similarly, E(sup|e|<M | V^/2(G)e-^-^+l^l'/2|) < +oo. The central limit the- 
orem governing the convergence of T?^'^ to follows from [14], Theo- 
rem A2. □ 



2. Strong law of large numbers and central limit theorem. Let 

0/-^ = A<-^ and 0f'^ = ^^{'^. 

The convergence of our estimator M„(^^''^,/) of E(/(G)) is ensured by the 
following theorem, which is a consequence of Propositions 2.6 and 2.13 be- 
low. As we do not take advantage of the definition of 0:['^ but only use its 
convergence properties obtained in the previous section, these propositions 
deal with the asymptotic properties of Mn{9l^^, g), where 51 : M'^ ^ M is an 
arbitrary function and 

E M-^ M„(0, 5) = - y g{G, + 0)e-^-^»-l^l'/2. 
n ^ 

To make the hypotheses on / precise in the case d' = 1 of a one-dimensional 
importance sampling parameter "i?, we introduce the following definition. 

Definition 2.1. For A G M"^, we say that a function /i:M'^ ^ M: 

• is A-nondecreasing (resp., A-nonincreasing) if 

Vx G M'^ G M 1-^ h{x + A'd) is nondecreasing (resp., nonincreasing) ; 

• is A-monotonic if it is either ^-nonincreasing or ^-nondecreasing; 
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• belongs to Vj\_ if h may be decomposed as the sum of two ^-monotonic 
functions gi and g2 such that 

(2.1) 3A>0,3/3g [0,2),VxGM \gi{x)\ < Xe^''^'' fori = 1,2. 

When d = 1, Vi consists of the functions of finite variation which sat- 
isfy the growth assumption (2.1). Let us also give an example in a higher 
dimension. 

Example. The time-discretization of the one-dimensional Black-Scholes 
model on the grid = tQ < ti < ■ ■ ■ < is given by (p{G), where fix) = 
^ ^r-a^l2 )tu+<^ Y.]^^x,^t, -t,^^^^^^^^ with a > 0. For the choice A = 
\Jt2 — ti, . . . , yjtd, — td-i)* , which corresponds to the Cameron-Martin for- 
mula for the underlying Brownian motion, each coordinate of the function 
(/3 is A-nondecreasing. Therefore, when either nondecreasing 

in each variable or nonincreasing in each variable, the function g o ip is A- 
monotonic. For gi{y) = {yd - K)+ and g2{y) = {yd - K)+l{^,i,,^y^>Ly, the 
functions g2° '■P and {gi — g2) o (p, which correspond to the down-and-out 
and the down-and-in barrier call options also belong to Va- More generally, 
all of the barrier call and put option payoffs belong to Va- 

Theorem 2.2. Assume (0.1), (0.2) and that f admits a decom,posi- 
tion f = fi + l{d'=i}/2i with fi a continuous function such that VM > 0, 
IE(sup|g|<jv/ |/i(G -|- 9)\) < +00 and f2 £ Va- Then, for any deterministic 
integer- valued sequence {i'n)n going to oo with n, Mn{dl'^, f) converges 
a.s. to E(/(G)). 

Note that for the integrability condition on /i to hold, it is enough there 
exist /? e [0,2), A > such that, for all x £ R'^, \fi{x)\ < Ae'^l''. 

Under stronger assumptions on /, the convergence of Mn{0^''^, f) to 
E(/(G)) is governed by a central limit theorem with optimal asymptotic 
variance 7;/'^(i9P) - E'^{f{G)). For a G (0, 1], let 

na = {g--R'^^Rs.t. 3pe [0,2),A>0,VxGM'^,|^(x)| < Ael^l" 

Vx, y e \g{x) - g{y)\ < Ael^l'^l^l^ - yH. 

Note that the assumptions of Theorem 2.2 are satisfied for / G Ha such that 
(0.1) holds and that the Holder condition in the definition of TCa implies the 
growth assumption for possibly larger constants A and (3. 

Theorem 2.3. Assume (0.1), (1.2) and that f admits a decomposition 
f = fi + f2 + l{(i'=i}/3> with fi a function such that 

VAf>0 e( sup |/i(G + ^)|+ sup |V/i(G + ^)|) <+oo. 
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/2 £ 'Ha with a E ( V*^ +^'^ — 'L^ 1] fiji(]^ g y^, r/ien, 

V^(M„(^i{'^,/) -E(/(G))) 4M(0,^;^'^(t?{'^) -E2(/(G))). 



Note that V +8*^ ~d increasing with d', equals ^ for d' = 1 and con- 
verges to 1 as ^ oo. Theorem 2.3 follows from Propositions 2.7 and 2.14 
below. With Proposition 1.2, one obtains the following corollary which en- 
ables us to construct confidence intervals for E(/(G)) with our algorithm. 

Corollary 2.4. Under the assumptions of Theorem 2.3, if\ai{f{G)) > 
0, then 

Remark 2.5. When Var(/(G)) is positive, the optimal variance v^'^((di''^) — 
¥?{f{G)) is also positive. The estimator vf^'^i'&l'^) - M^(6'j{'^, /) converges 
a.s. to this variance, but may take negative values for n small. 

Examples. The hypotheses of Theorems 2.2 and 2.3 are satisfied in the 
example given after Definition 2.1. Let us give other examples, still inspired 
by financial applications. 

• f{x) = (-ftT + X]fc=i '^fcc'^'''''*^^^''')'''! where the coefficients K, uj^ and are 
real numbers and M G M'^^'^: this class of functions belonging to Tii in- 
cludes the payoffs of call and put options written on baskets of underlyings 
in a multidimensional Black-Scholes framework or on a discretely sam- 
pled arithmetic average of a single Black-Scholes asset and the payoffs of 
exchange options on baskets. 

• fix) = (i<: + max^^^a;fce'^^(*^^)fe) + , f{x) = (i^T + minf^^ a;fce'"'=(*^^)'=) + : this 
class of functions belonging to 7ii includes the payoffs of best-of options. 

• When d=l, the functions of bounded variation f{x) = l{(^e'^^>A'} 
/(x) = l{a;e'^^<if} belong to Vi and correspond, respectively, to binary 
call and put options in the Black-Scholes model. 

• Let us consider the model 

dSt = St{a{t,St)dWt + rdt), So = s, 

where {Wt)t>o is a one-dimensional Brownian motion and the local volatil- 
ity function o": [0, T] x M — > M is bounded and such that x i— > xa{t,x) is 
Lipschitz continuous uniformly for t G [0, T]. When discretizing this SDE 
by the Euler scheme with d steps of length h = T/d on [0,T], one ap- 
proximates St by 99(G), where ip{x) = (t)d{xdAd~i{xd~i, ■ ■ ■ : (t'i{xi,s))) 
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with (j)k{u,v) = v{l + a{{k — l)h,v)\/hu + r/i), and G = -^{Wh,W2h — 
Wh,...,Wdh-W^a_,),,). 

There exists C > such that, for all k £ {1, . . . ,d}, 

\fu,v,u',v' €R \Mu,v)\ <C\v\{l + \u\), 

\Mu,v) - Mu,v')\ < Ciil + {\u\ V |n'|))|^; - v'\ 

+ i\v\V\v'\)\u-u'\). 

One deduces, by induction, that for x,y £ W^, \(p{x)\ < C"^|s|nfc=i(l + 
kfel) <C'^|s|e^l^l and 

d d 

Hx) - ipiy)\ < C^lsl ^ \xk - yk\ Y[{^ + V \yj\)) 

k=l j=l 

<C"^|s|\/de'^(l"Ms/l)|^_y|. 

Hence, the functions f{x) = ((^(x) — K)~^ and f{x) = {K — ip{x))'^ corre- 
sponding to the call and put payoffs in the discretized model belong to 
Hi. 

We are now going to study the convergence properties of Mn{9n^,g) in 
the multidimensional framework d' >1 before obtaining stronger results in 
the case = 1 of a one-dimensional importance sampling parameter. 

2.1. The general case. 

Proposition 2.6. Let (0n)n>i o- sequence of d- dimensional random 
vectors converging almost surely to some random vector O^o and g:W^ be 
a continuous function such that VM > 0, E(sup|g|<j^ \g{G + 9)\) < oo. Then, 
Mnidn,g) converges a.s. to E.{g{G)). 

Proof. We apply Lemma 1.4 with (Xj)j>i = (Gj)j>i and h{6,x) = 
g{x + 0)e~^'^^l^l The continuity assumption follows from the continu- 
ity of g. Concerning the integrability condition, we deduce from (0.3) and 
the inequality 

sup i\g{G + 0)\e-'-''-\'\'/^)< sup 1^(0 + 0)| TT (e^^«' + e-^^«') 
W<M \e\<M 

that 

e( sup {\giG + 0)|e^^-^"l^l'/2)) < e'^*^'/^ ^ j^^^ + ^ + ^)|) 

<2V^^'/2e( sup \g{G + e)\). 
\e\<ii+Vd)M 
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Therefore, a.s., 9 Mn{g,9) converges locally uniformly to the constant 
function 9 K{h{9, G)) = ¥.(g{G)). We easily conclude with the a.s. conver- 
gence of 9n to ^oo • n 

Proposition 2.7. Assume that g:R'^^Ris such that K{g'^ {G + 91''^) x 
g-26»f -G^ ^ g^^^ admits a decomposition g = gi + g2, with gi of class 
C^and satisfying 

(2.2) VM>0 e( sup |5i (61 + G)| + sup |V£?i (6* + <cx), 



and g2 € Ha for a G ( "^'^ '^^^ — ^ , 1]. T/ien, under (0.1) and (1-2), 
V^{MM'^,g)-E{giG))) ^ M(0, Var(5(G + e,^'^)e-^''^-«-l^*'-'l'/2)). 

By the central limit theorem, ^/E{Mn{9l''^,g)-E{g{G))) ^ Afi{0,\av{g{G + 

9l'^)e~^*' 1^/^)). As a consequence, it is enough to check that for 

i E {1,2}, ^{Mn{9l:^,gi) - Mn{9t^,gi)) ^ 0. The next lemma deals with 
the case i = 1 . 

Lemma 2.8. Let g:W^ — >M he a function satisfying (2.2). Then, 
under (0.1) and (1.2), ,/E{Mn{9f,'^,g) - Mn{el'^,g))^0. 

Since, for e > 0, 

¥{V^\Mni9i^^,g2) - mM'^,92)\ >e) 
<P(n^|i?J^'^-i?{'^| >1) 

+ P( sup V^|M„(Ai9,52)-M„(Ai9{'^,<72)|>e), 

|i9-)?f-*|<l/n'5 

choosing 5 G {d' /2a{d' + 2a), 1/2), which is possible since a > ^ — , 
the case i = 2 follows from the central limit theorem governing the con- 
vergence of i?^'"^ to 'ffl'^ (see Proposition 1.2) combined with the following 
result. 

Proposition 2.9. Letting A G M'^^'^' and g eTia for a£ (0, 1], 

V5 > -,Wo G M"'' 

2a{d' + 2a) 

sup ^/^iMniA-d^g) - Mr,{A'do,g)\ ^ 0. 

|i9-i?o|<l/n* 
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Remark 2.10. By the argument given in the case i = 2, if (7 G TCa for 
a > ^^^-^' , then 

^(M„(^/;;4,g) -E(g(G))) ^M(O,Var(5(G + 0/'^)e-^''-'-«-l^'-'l'/2)) 
for any deterministic integer-valued sequence (fn)n such that 3A > 0, 37 > 

Proof of Lemma 2.8. The function 6>-^Mn{-,g) is of class and 
it is easy to check that VgMn{0,g) = Mn{9,g) with g{x) = Vg{x) — g{x)x. 
The mean value theorem gives ^/n{Mn{9f^'^,g) — Mn{9l'^ , g)) = Ay/n{'dl;^ — 
H'^)-Mn{9f^^,g), with^^'^ e {9f^^,9l'^). Since, by Proposition 1.2, 

converges in law to a normal random variable, it is enough to prove that 
^n{9l^^,g) 0. The a.s. convergence of ^(^^ to "dl'^ implies the a.s. con- 
vergence of 9l^^ to 91'"^ . Since 



sup |(G + %(G + e)|< fx^(e«'+e-G')+M) sup \g{G + 9)l 
e\<M Vfe=i / |e|<M 



(2.2) combined with the reasoning used at the beginning of the proof of 
Proposition 2.6 yields 



(2.3) VM>0 E sup \{G + 9)g{G + 9)\ + sup \V g{G + 9)\) < +00. 

Proposition 2.6 then implies that Mn{9l^'^,g) ^E{g{G)). By (2.3) and the 
reasoning used at the beginning of the proof of Proposition 2.6, 

VM > e( sup \g{G + 9)\e~^-^^^^^^/^) < +00. 
\e\<M ' 

Hence, Lebesgue's theorem implies that \l Q¥.{g{G^9)e^^''^^\^\^ 1"^) = E(^(G-F 
^^g-e-G-l^l /2^_ Since the left-hand side is equal to 0, one deduces for = 
that E(5(G)) = 0. □ 

Remark 2.11. Let : M*^ ^ M be a function, g(x) =Vg{x) - g{x)x 
and g{x) =^ V'^g{x) — g{x)Id — xV*g{x) — Vg{x)x* + g{x)xx* . Assume that 

(2.4) E((|5(0f'^ + G)|2 + 15(0/'^ + G)|2)e-2ef ^.G) < 



VM>0 E{ sup \g{9 + G)\ 
\e\<Ai 

(2.5) 

+ sup \Vg{9 + G)\+ sup \V^g{9 + G)\) < 00. 
\e\<M \e\<M ' 
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Let ( 

^n)n be a deterministic integer-valued sequence such that 3A > 0, Vn G 
N*, f „ > X^/n. Then, using the decomposition 

+ ^^nieit - H'^r ( /V - t)Mn{t9i'' + (1 - t)el'^rg) dt) 

one obtains that under (0.1) and (1.2), the left-hand side converges in proba- 
bility toO. As a consequence, ^/ri{Mn{6lif',g)-E{g{G))) ^ Mi{0,YaT{g{G + 

dl'^)e~^*' 1^/^)). More generally, if g is of class and satisfies mo- 

ment assumptions like (2.4) and (2.5) involving its derivatives up to order 
k — 1 and k, respectively, this result is preserved if 3A > 0,Vn G N*,f„ > 

In order to prove Proposition 2.9, we need the following lemma. 

Lemma 2.12. If g eTCa for a £ {0,1], then 

VM >0,3C> 0, ye, e' G B{0, M),yn G N* 

E{{Mn{e,g)-Mrr{e',g)f) < ^ 

n 

Proof. Let M > 0. Since, by (0.3), 

E{g{G + e)e-'''-'''/' - g{G + 9')e~''''~'" = 0, 
it is enough to check that 
3C>0,ye,e' €B{0,M) 

K{{g{G + e)e-^-G-l^lV2 _ + ^>-^'-«-l^'lV2)2) < (j\e - 9f". 

One has 

EMG + 0)e-^-^-l^lV2 _ + e')e~''-^~\''\'/'f) 
< 2E{{g{G + 6)- g{G + 0'))2e-2e-G-|ep^ 

+ 2E{g\G + e'){e-'-''-\'\"/^ - e~''-G~\''\' l^f). 
Let A > and (3 G [0, 2) be such that 

(2.6) VxGM"' |5(x)| < Ael^l^ 

(2.7) Vx,yGM"' \g{x) - g{y)\ < \e\''\^''\y\^ \x - y\'^ . 
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One has 

(2.8) for c = 2(^^i)^,Va,6>0 (a + 6)'^ < c(a^ + 6^). 

Since, for eG^(0,M), | Vee-^-^-^'/2| ^ ^^Q^g^^-e-G-ey2^ < +M)e*'^|G|, 
one deduces that, for 6,6' G 5(0, M), 

E{{g{G + ^)e-^-^-l^l'/2 - g(G + ^')e-^'-«-l^'lV2)2) 

< 2X^e^^^''Ei{\6 - ° + |e - 6f{\G\ + M)2)e2^'^l«l+2'=IGI'') 
<C|(9-6''p". □ 

Proof of Proposition 2.9. Let e > 0. 

P( sup V^|M„(^^9,5)-M„(^i9o,5)l >e) 

< nP(|G| > v/2dlogri) 

+ P( sup V^\Mn{A^,g)-Mn{A^o,g)\>e, 

max < \/2dlogn) . 

l<i<n ' 

Since 

P(|G| > v/2dlogn) 

d 

(2.9) <^P(|G^1>y2l^) 

fc=i 

= 2dP(Gi > v/2T^) < ^^e-(V^)V2^ 

V2 logn 

the second term of the right-hand side tends to as n goes to infinity. Now, 
let us focus on the first term. 

Let M=|i?o| + l and M = \A\M. For G 5(0, M), using (2.7), (2.8) 
and (2.6) for the second inequahty, one obtains 

\Mn{A'd',g)-M^{A^,g)\ 

< i ^ \g{G, + A^') - g{G, + ^^)|e-^'''-^-l-^'^'l 

n ^ 

1=1 

1 " 

^ A|.An^^ - ^^,(|c^|/H+M^)+M|Gd(i + (2M)i-"(M + |G,|)). 

2 = 1 
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Hence, when maxi<j<„|Gi| < \/2dlogn, there exists a constant 7 not de- 
pending on n such that if =^ , 

Vt?',??G5(0,M) 

(2.10) 

\Mn{A^',g) - Mn{A^,g)\ < - ,?'|-e^('°sn)^_ 
We can cover S(^?o, ^) with = c [(7i/"ni/(2")-<5e(7/«){iogn)-/^i/a)d'^ ^^^Is 
of radius ( 2^e'^(iogn)'^^ )'''^"; where C is a geometrical constant not depending 
on n. For A: G {1, . . . ,K}, let B/^ denote the fcth bah and i?^ its center. By 
(2.10), when maxi<j<„|Gi| < ^/2dlogn, 

yke{l,...,K} sup \Mn{A^,g)-Mn{A^k,9)\<:^. 
Using Lemma 2.12 for the fourth inequality, one deduces that 



'( sup y/n\Mn{A'd,g) -M„(^??o,9)l > e, max \Gi\ < V2(ilogi 



> ^ - sup |M„(^i9,5) - Mn{A^k,9)\, 



max I G j I < \/2dlogn ) 

L<j<n / 



1< _ 

e 



<¥{max\MniA^k,9) - Mn{A^0,9)\ > ^ r- 
\k<K 2\/n 

< ^ ^E{{Mn{A^k,9) - Mn{A^o,g)f) 



k<K 



(2.11) < y l!!^]lZL^^ <C'n'^7(2a)-(d'+2a)5gd'7/a(logn)''_ 
k<K 

Since /? < 2 and 5 > 2a{d''+2a) ' < ^ ^'^'^ ^ ~ + ^")^ < 0- Therefore, the 
upper bound in equation (2.11) converges to as n increases to infinity. □ 

2.2. The case of a one- dimensional importance sampling parameter. In 
the present section, dedicated to the case d' = 1 of a one-dimensional im- 
portance samphng parameter, we obtain convergence results under weaker 
assumptions on the function g. 

Proposition 2.13. Let AeR''-. Assume that g-.R'^^R admits a de- 
composition g = gi + g2 with gi a continuous function such that \/M > 0, 
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E(sup|g|<ji/ \gi{G + ^)|) < +c>o and g2 € V_a. Then, for any sequence {'&n)n 
of real-valued random variables converging a.s. to some deterministic limit 
"di, G M, Mn{A'dn-,g) converges a.s. to K{g{G)). 

Proof. By Proposition 2.6, it is enough to deal with the situation 
where g = gi + gi, with g-^ (resp., gi) being an ^-nondecreasing (resp., A- 
nonincreasing) function satisfying (2.1). One has g = g^l^g^yQj + g-^l^g^^Qj + 
5il{3i>0} +5'il{gj^<0}. where the functions g^l{g^>o} and -g^l^g^^Q^ (resp., 
5(|l|gj^>o} and —g-^l^g_^^Qj) are nonnegative, ^-nondecreasing (resp., ^-non- 
increasing) and satisfy (2.1). As a consequence, it is enough to deal with 
the case where g is nonnegative, ^-monotonic and satisfies (2.1). Choosing 
t!)' >i!) when g is „4-nondecreasing and '&>'&' when g is ^-nonincreasing, 
one has, for all x S R*^, 



gix + ^^Oe-^'''""'^'''' - + ^^)e--^''-^-l-^''l /2 

(2.12) > {g{x + A&){e-^'''-'-\^'''\'/^ - g--^'^— l>''lV2)) 

V {g{x + ^^?)(e--^^'-^-l-^'''l'/2 _ e-^^--|^-?lV2)). 

From now on, we suppose that g is nonnegative, ^-nondecreasing and sat- 
isfies (2.1): a symmetric argument applies to the nonincreasing case. Let 
e > 0, r/ G (0, 1] . For m£W, 

P(3n > m, \E{g{G)) - M„(^i9„,5)| > e) 

< P(^3n > m, \E{g{G)) - Mn{A^,,g)\ > 
+ P(3n>m,|i9„-T9^| > 77) 
+ P^Vn > m, I'dn — ^ V and 

3n>m,\Mn{A-d^,g)-Mn{A'dn,g)\>^ 

By the strong law of large numbers and the a.s. convergence of to i?*, the 
first two terms on the right-hand side both converge to as m — > +00. Let 
us check that the third one also converges to 0. Let M = liJ^I + l, K > 0. For 
—M M, one has, using (2.12) for the first inequality, then (2.1) 

and (2.8), that 

Mn{A^',g)-Mn{A^,g) 



> - 

n 



1 

-Y,i\g{G,+A&)\A\g{G,+Am 



i=l 
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1=1 



1 = 1 

where 7x = A|^|e^(^^+(*-^l-^l)'')(M|^| +i^)e^-^^l-^l and C = 2Ae^(*^l-^l)^ When 

— < rj, choosing i? = i?„ and t?' = + then i) = i}^, — r] and t?' = 
one deduces that M„(^'(?^,(7) — Mn^Ai^n, g) is bounded from below and 
above, respectively, by 

Mn{A^^, g) - Mn{A{d^ + r]),g)-jK{^^ + r]- 

1=1 

and 

Mn{Ai^^,g) - Mn{A{'&^ - v).g)+lK{K + 11- 
" i=\ 

Choosing K such that E(e'=l'^'l''+*^l-^ll'^'ll||G',|>i^}) < ^ and then r? such 
that 27a'?7 < |, we deduce that 

Vn > m, - 19*1 < and 3n > m, |M„(^'i?v,, /i) - M„(^'!9„, ^)l > | 



+ P( 3n > m, M„(^i9*, 5) - M„(^(i9* + r?), 9) < -| 



By the strong law of large numbers Mn(^i9*,(/) — Mn(^('!9* + and 
M„(i9*,g)-M„(^(i9*-r/),5) both converge a.s. to and ^ ELi e'^''^''''"^*^'-^"'^'' x 
l{|Gi|>i<:} to some limit not greater than One concludes that each term 
on the right-hand side converges to as m ^ 00. □ 

Proposition 2.14. Assume that g:R'^ ^Ris such that E{g'^ {G + Ol'"^) x 



) < +00 and admits a decomposition g = gi + g2 + l{d'=i}5'3) with 



ROBUST ADAPTIVE IMPORTANCE SAMPLING 19 

gi of class satisfying (2.2), 52 G "^a for a G ( ^'^ +sd j-d_ ^ ^ ^ 
Then, under (0.1) and (1.2), 

As in Proposition 2.7, this statement is proved by combining the usual cen- 
tral limit theorem governing the convergence of y/n{Mn{Ol''^ , g) — ¥.{g{G))), 
Lemma 2.8, Proposition 2.9, the decomposition of functions in Va given at 
the beginning of the proof of Proposition 2.13 and the next result. 

Proposition 2.15. Let A^W^ and g-.W'- be an A-monotonic func- 
tion with constant sign satisfying (2.1), 

V(5> l/4,Vi?o gK, sup ^/^\Mn{A'^,g)-M^{A'^Q,g)\^{). 

Remark 2.16. Assume that d' = 1. Let g G Va, and {i'n)n be a deter- 
ministic integer- valued sequence such that 

3A>0,37> i,VnGN*, i/„ > An^. 

Combining Propositions 1.2 and 2.15, one obtains that under (0.1) and (1.2), 

^{MMl^.g)-ng{G))) ^M^{Q,\nT{g{G + el'^)e-<>'*'^-G-\sl'^\V2))_ 

Proof of Proposition 2.15. Up to a multiplication by -1, we may 
assume that g is nonnegative. Moreover, we only deal with the case where g 
is ^-nondecreasing, the nonincreasing case being obtained by a symmetric 
argument. By (2.12), for ^' < ^" and e {&','&"], 

M„{A^',g) - iy |5(G, +^i9)||e-^''-^-l-^''l'/2 -e-^'''-«-l^'^'l'/2| 
n ^ 

<AUA^,g) 
<Mn{A'd\h) 

n ^ 

With (2.1) and (2.8), one deduces that if -M <'&' < ^" < M, then 

sup \MniAi^,g) - Mn{A'&o,g)\ 
i9e[i9',-i?"] 

<max{\Mn{A^',g)-Mn{A^0,9)\,\Mn{Ar,g)-Mn{A^0,g)\) 

(2.13) +^KjlIlye'\G.\^+^'\^\\^^\{M\A\ + |G,|). 

1=1 
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Let V = and M = |t?o| + 1- When maxi<j<„ < i/2dlogn, the second 
term on the right-hand side is smaller than ^e"^^'"^"^" (J}" — "!?'), where the con- 
stant 7 does not depend on n. Let e > 0. We set K = \2'^n^/'^~^e^^^°^^'>'' /e] 
and 1?^ = i?o + 'te/27e'^('°g")'' for k e {-K,. . . ,K]. Applying (2.13) with 
i9' = "dk and = 'dk+ii one obtains that when maxi<j<n, |Gi| < \/2(ilogn, 

sup \Mn{A'd,g)-Mn{A^o,9)\ 
i9e[i?fc,i?fc+i] 

< + max( I M„ {A-dk ,9)-Mn {A-do ,g)\, 



\MniAi^k+i,9) - MniAi^o,g)\). 



Therefore, 



e 



sup \MniA'd,g) - Mn{A'do,g)\ > 
i9e[i?o±i/n''] 



<P max \Gi\ > V Za log n 

Vl<i<n 



-FP( max \Gi\ < y/2dlogn, 

, l<i<n 

max \Mn{A^k,9)- Mn {A-do ,g)\> ^ 



\k\<K 2^ 

By (2.9), the first term on the right-hand side tends to as n — > oo. Rea- 
soning as we did at the end of the proof of Proposition 2.9, with the next 
lemma replacing Lemma 2.12, we conclude that the second term also tends 
to 0. □ 

Lemma 2.17. When A^"^^ and g:W^ is a A-monotonic function 
with constant sign satisfying (2.1), we have 



VM > 0, 3C7 > 0, W,i}' £ [-M, M] , Vn G N* 
E{{Mn{A^,g) - Mn{A^',g)f) < 



2, ^ C\^-^'\ 



n 



Proof. Choosing > i9 if is nonnegative and ^-nondecreasing, or 
nonpositive and ^-nonincreasing, and ■&>{}' otherwise, one has 

E((5(G + ^^)e--^''-^-l^''l'/2 - g{G + A{}')e-^'''-''-\^'''\"'^f) 
= E(5'(G)e--^''-^+l-^''l'/2)+E(52(^)g->^'.G+|>^f/2) 

- 2-K{9{G)g{G + A{^' - ^))e-^^'-G+A^-^^'-\A&f /^^ 
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The conclusion is then a consequence of the following inequality: for 9, 9' £ 
R"^ with \e\ V \0'\ < \A\M, 

E(52(G)(e-^-^+l^l'/2 + e-e'-G+|e'|V2 _ ^^-e'.G+e.e'-\e'\^/2^^ 

+ 2e-^'-«-l^'l'/2|el^'l^-e^-^'|)) 

<c(\9-e'\ [\mt)\V2 f |^(t)_x|e-l^+2''WI'/^dxdi 
V Jo Jw^ 

+ 2el^'lV2|el'''P-e^-^'| / e^l^+^^'lV^ 
<C7|0-^'|. □ 

3. Practical implementation and applications. Option pricing in local 
or stochastic volatility models eventually boils down to the computation 
of an expectation E(/(G)), where G is a d-dimensional standard normal 
random vector. In a financial context, there is no restriction in assuming 
that the payoff function / satisfies both (0.1) and (0.2). In most cases, this 
expectation will be computed using Monte Carlo simulations because closed 
formulas are barely available. The question of reducing the variance arises 
quite naturally in this context. Relying on equation (0.3), we have chosen 
the importance sampling point of view to tackle the delicate problem of 
variance reduction. Practitioners' desires with variance reduction is to have 
an automatic toolbox at hand, which is precisely what we are devising here. 
As explained in the Introduction, we advise to compute the minimizer ■i?^'"^ 
of vfi^"^ and then to use this value in a Monte Carlo procedure, as described 
in Algorithm 1. Note that the same samples are used to compute i?^'"^ and 
the Monte Carlo estimator MniA'df^'^ , /). Even though the terms involved in 
MI(Ai!}1^^, f) are not independent, according to Corollary 2.4, it is as easy 
to construct confidence intervals, as for a crude Monte Carlo computation. 

Remark 3.1. In the name ("Reduced Robust Importance Sampling") of 
Algorithm 1, the term "Reduced" emphasizes that the optimal importance 
sampling parameter is searched for in a subspace of the set of all parameters. 
When the matrix A = Id, the algorithm is simply denoted RIS because there 
is no longer any dimension reduction. 

In this section, we first explain how "i?/^'"^ can be computed using New- 
ton's optimization procedure. We then illustrate the efficiency of this robust 



22 



B. JOURDAIN AND J. LELONG 



Algorithm 1 Reduced Robust Importance Sampling (RRIS) 

1. Generate Gi, . . . , Gn, n i.i.d. samples following the law of G. 

2. Compute the minimizer i?^'"^ of 

Tl . 

1=1 

3. Compute the expectation E(/(G)) by Monte Carlo 

1 " 

n ^ 

1=1 



variance reduction technique, both in the multidimensional Black-Scholes 
framework and in more general local volatility frameworks. 

3.1. Solving the minimization problem. We already know, from Proposi- 
tion 1.1, that the function vf^'^ is strongly convex and infinitely continuously 
differentiable. Hence, we can approximate i?^'"^ using Newton's algorithm, 
for instance. The Hessian matrix V|?;^''^(i?) can be written as the sum of 
a scalar matrix and a positive semidefinite matrix. Hence, it is quite ob- 
vious that the smallest eigenvalue of V|u^'^(i?) is larger than the smallest 
eigenvalue of A* A times i J2i=i f{Gi)e-'^^-^'+\^^\'/^. This last term can be 
arbitrarily small, depending on the function /. Therefore a straightforward 
application of Newton's algorithm can be particularly inefficient in some 
cases. It would be much better to have an alternative representation of 'dl^^ 
as the minimizer of a function, the smallest eigenvalue of whose Hessian 
matrix does not depend on /. We advise to rewrite V^v(^^((&) as 



1 " 

1=1 

1 " 



n 

'L = L 

Hence, t?,^'"^ can be seen as the root of 

V - A* i,? _ TJl=iA*Gip{Gi)e-'^~^-^^ 

with uf^'^i-d) = ^ + log(Er=i f{Gi)e-^^-^^). The Hessian matrix of uJ^'^ 
is given by 

\72,J,A(M) - A*A^ T,i=i'^*GiG*Ap{Gi)e- 
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Using the Cauchy-Schwarz inequality, it is clear that V|u^''^(i?) — ^4* A is a 
positive semidefinite matrix. Hence, the smallest eigenvalue of V|u^'"^(??) is 
always larger than the smallest one of A* A, whatever the values taken by / 
are. This advocates the use of u^'"^, rather than v(^^, to compute 

Using this new expression, we implement Algorithm 2 to construct an 
approximation of 'd(^^. Since u^^^ is strongly convex, for any fixed n, 
converges to 'd^^^ when k goes to infinity. The direction of descent at step 
k should be computed as the solution of a linear system. There is no point 
in computing the inverse of V|ti^''^(3;^), which would be computationally 
much more expensive. 

Remarks on the implementation. From a practical point of view, e 
should be chosen reasonably small, e ~ 10~^. This algorithm converges very 
quickly and, in most cases, less than five iterations are enough to get a very 
accurate estimate of "i?^'"^, actually within the e-error. Since the points at 
which the payoff function / is evaluated remain constant through the iter- 
ations of Newton's algorithm, the values /^(Gj) for i = 1, . . . ,n should be 
computed before starting the optimization algorithm, something which con- 
siderably speeds up the whole process. The Hessian matrix of our problem 
is easily tractable, so there is no point in using quasi-Newton's methods. 

3.2. Numerical examples. In this subsection, we present numerical re- 
sults obtained by combining Algorithms 1 and 2 for different pricing prob- 
lems. For each example, we have computed the reference price using a 
crude Monte Carlo estimator with a huge number of samples, such that 
the width of the 95% confidence interval is 10""^. In the columns "price 
MC," "price RIS" and "price RRIS" of the tables, we give, respectively, the 
crude Monte Carlo estimator, the RIS estimator and the RRIS estimator 
of the price computed with the same, smaller, number n of samples. The 
variances for the RRIS algorithm (resp., the RIS algorithm) given in the 



Algorithm 2 Newton's algorithm 

Choose an initial value x^ G M'^. 
k = l 

while \V^ul^^{x'^)\ > e do 

1. Compute such that (V|n('^(x^))d^ = -V^n^'^(x^). 

2. xi+^=xi + di,k = k+l. 
end while 
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tables below are computed along a single run of the algorithm using the es- 
timator vl'^{i)i^^) - M2(0/.^, /) [resp, vHel) - M2(0/, /)] which converges 
almost surely to {Ol) — ¥?{f{G)) under the assumptions of Theorem 2.2. 
The variances of the crude Monte Carlo methods (denoted 'Var MC in the 
tables) are estimated by \ Y.'l=iP{Gi) - ELi /(C*))^- Ah of the his- 
tograms presented hereafter are centered around their empirical means and 
renormalized by the empirical variances. When no further indications are 
given, the matrix A is chosen as the identity, which implies that d = d! and 

^ n • 

3.2.1. Black-Scholes framework. First, we consider an /-dimensional Black- 
Scholes model in which the dynamics under the risk-neutral measure of each 
asset is supposed to be given by 

dSi = si (r dt + a' dWl), 5o = (So\ . . . , ^o') , 

where W = (VF^, . . . , W^^. Each component is a standard Brownian mo- 
tion. For the numerical experiments, the covariance structure of W will be 
assumed to be given by lyV^.,W^)t = ptl^i^jj +tl^i^jj. We suppose that 
p e (—7^,1), which ensures that the matrix C = + ^{i=j})i<i,j<i 

is positive definite. Let L denote the lower-triangular matrix involved in 
the Cholesky decomposition C = LL* . To simulate W on the time-grid 
< ti < ^2 < • ■ • < ^Afj we need d = I x N independent standard normal 
variables and set 



(y/tlL ... \ 

y/tlL yjt2 - tiL ... 



\/tN-l — tN-2L 



where G is a normal random vector in M.^^^ . The vector (cj^, . . . , a'^) is the 
vector of volatilities and r > is the instantaneous interest rate. We will 
denote the maturity time by T. 

Basket option. We consider options with payoffs of the form (X]f=i uj^Sj. — 
where {u!^ , . . . , lo'^) is a vector of algebraic weights. The strike value 
K can be taken to be negative, to deal with put-like options. All of these 
payoffs belong to TCi, so Theorem 2.3 applies, as Figures 1 and 2 illustrate. 
These histograms have been obtained with 5000 independent runs of the 
RIS algorithm. The case of such basket options is definitely a crucial issue 
because there is no closed formula as soon as d> 2, and the variance of a 
crude Monte Carlo approach can be dramatically large. We can see in the 
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examples of the basket options treated in Table 1 that the Robust Impor- 
tance Sampling method does reduce the variance by at least 10. The results 
are obtained within 4.5 CPU seconds, compared to the 1.5 CPU seconds 
needed for the crude Monte Carlo computation. The same number of sam- 
ples are used in both methods, which brings an overall gain of 3.3 in favor of 
the RIS algorithm. In the case p = 0.2 and K = 50, which is the option used 
for the histograms, the empirical variance is 1.76, whereas the on-line esti- 
mated variance is 1.74. This illustrates the conclusion of Corollary 2.4. The 
improvement brought by the RIS algorithm is very encouraging, not only 
because it definitely reduces the variance, but, above all, because it is fully 
automatic. Unlike most adaptive importance sampling strategies developed 
so far and, in particular, the ones based on stochastic approximations, the 
one we propose here does not require any parameter tuning. 




Fig. 1. Limiting distribution of 9^ for the option of Table 1 with p = 0.2 and K — 50. 



0.4 



0.3 



0.2 



0.1 



" -4 -3 -2 -1 1 2 3 4 

Fig. 2. Limiting distribution of AIn{9i, f) (RIS) for the option of Table 1 with p — Q.2 
and if = 50. 
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We have also tested our algorithm on a 10-dimensional exchange option 
with randomly chosen spots and volatilities. The numerical results of Table 2 
show that the RIS algorithm performs well for a wide variety of basket 
options. In any case, the variance is divided by at least 7, whereas it increases 
twice the CPU time. This leads to an overall gain of 3.5 in the worst case. 

One- dimensional digital option. We consider an option with payoff 1| St>L} ? 
where L > 0. We choose T = 1, Sq = 100, a = 0.2, r = 0.05 and L = 140. We 
fix the number of samples at 100,000. A crude Monte Carlo computation 
gives a price of 0.05952 with a variance of 0.053, whereas the exact price is 
0.05968. On each run of the algorithm, we can compute the on-line estimator 
of the variance and use it to construct a confidence interval. We have run 
the RIS algorithm 100,000 times independently and, on each run, we have 
constructed the confidence interval of level 95% using the on-line estimated 
variance vl^^{dli^) — M^(0^'"^, /). The true price falls outside the confidence 
interval in 5104 cases out of 100,000, which gives a level of 94.9%. This little 
experiment illustrates how Corollary 2.4 can be used to construct confidence 
intervals. 



Table 1 

Basket option in dimension d = 40 with r — 0.05, T =1, Sq = 50, (t* — 0.2, uj'' — ^ for all 

i = 1, . . . ,d and n — 10,000 



P K 


Price 


Price MC 


Variance MC Price RIS 


Variance RIS 


0.1 45 


7.210 


7.216 


12.12 


7.209 


1.04 


55 


0.561 


0.567 


1.90 


0.559 


0.14 


0.2 50 


3.298 


3.304 


13.56 


3.296 


1.74 


0.5 45 


7.662 


7.678 


42.2 


7.650 


5.06 


55 


1.906 


1.879 


14.46 


1.906 


1.25 


0.9 45 


8.215 


8.154 


69.47 


8.211 


7.89 


55 


2.823 


2.823 


30.08 


2.819 


2.58 








Table 2 






Basket option in dimension rf = 10 


withr^Q.Q^, T=l, K = 


= 0, p = 0.2. 


The spots are 


chosen uniformly in 


70,130] and the volatilities in [0.1,0.3]. 




i = l,...,d/2 




and 


Lo' = -\ fori 


— d/2 + 1, . . . , d and n — 


100,000 




Price 




Variance MC 




Variance RIS 


3.58 






21.66 




2.97 


0.129 






0.511 




0.016 


7.4 






34.04 




5.02 


1.08 






5.24 




0.52 
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One dimensional barrier option. This time, we only focus on one asset and 
we want to price a call option with a discrete barrier on this asset. A discrete 
barrier means that we only check if the asset has crossed the barrier at fixed 
dates ti, . . . , trf = r, usually one per month. We assume that the grid defined 
by ti, . . . ,t(i is regular with step size 6t = T/d. The payoff can be written 
as {St — -f^^)+l{vi<i<(i,St >i^} ^ down-and-out call option with barrier L. 
The price of such an option can be written as E(/(G^, . . . , C^)) with 

X 1 

\ {r-<T2/2)t,+<T^y^' 

{Vl<j<d,Soe ' ^3=1 ^>L} 

In this particular case, if we consider the RIS algorithm developed before, 
the importance sampling parameter 9 lies in M'^. Hence, the optimization 
problem becomes harder to solve as the number of time steps increases. 

One idea is to restrict the parameter 6 to the subspace {AiD:^ £ M}, 
where the vector A is defined by ^ = {\/ti, . . . , — td-i)* ■ In this case, 
the optimal parameter is always real- valued d' = 1, whatever the number of 
time steps we consider. This alternative approach — referred to as RRIS (Re- 
duced Robust Importance Sampling) — corresponds to adding a linear drift 
to the Brownian motion. These two approaches are compared in Table 3 for 
the case of a down-and-out call option and it turns out that the optimal 
variances obtained in both cases are very close to each other. When the 
underlying asset is of dimension one, the computation time gained by using 
the RRIS algorithm instead of the RIS one is not that important, but it will 
become a crucial issue for multidimensional barrier options. The efficiency 
of the two algorithms on the down-and-out call option is very impressive. 
As in the previous example, the variance is reduced by a factor between 8 
and 11. The use of the RRIS algorithm compared to a crude Monte Carlo 
method doubles the computation time, which means that the gain is at least 
4. Figures 3 and 4 illustrate the asymptotic behavior of the RIS algorithm. 
They have been obtained by running the RIS algorithm 5000 times inde- 
pendently. The histogram of Figure 3 represents the limiting distribution 
of the first component of ft^ computed with the RIS algorithm and fits the 
density of the standard normal distribution (plain line) rather well, which 
illustrates Proposition 1.2. Although the hypotheses of Theorems 2.2 and 
2.3 are not satisfied for the payoff at hand in the RIS framework, Figure 4 
shows that our estimator is still convergent and asymptotically normal. This 
numerical convergence is emphasized by the matching of the empirical vari- 
ance of the histogram and the on-line variance computed on a single run of 
the RIS algorithm; for these two quantities, we find, respectively, 34.70 and 
35.68. Since the payoff belongs to Va, the convergence and the asymptotic 
normality of the RRIS estimator are, in return, ensured by Theorems 2.2 
and 2.3. 
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Table 3 

Down-and-out call option with a — 0.2, r = 0.05, T = 2, Sq = 100, K = 110 and 

71 = 10,000 



L 


Price 


Price MC 


Variance MC 


Variance RIS 


Price RRIS 


Variance RRIS 


70 


11.445 


11.472 


401.51 


34.10 


11.454 


34.33 


80 


11.244 


11.240 


401.04 


35.68 


11.261 


36.11 


90 


9.689 


9.672 


383.93 


42.54 


9.705 


45.37 


95 


7.564 


7.518 


342.05 


42.01 


7.557 


49.84 



Barrier basket option. We consider basket options in dimension I with a 
discrete barrier on each asset. For instance, if we consider a down-and-out 
caU option, the payoff can be written as (J2i=i ^^Sj. — K)^l^y^^jyj^^^gi ^j^iy , 




Fig. 3. Limiting distribution of the first component of (RIS) for the option of Table 3 
with L = 80. 




-4-3-2-10 1 2 3 4 



Fig. 4. Limiting distribution of M„{6l^, f) (RIS) for the option of Table 3 with L = 80. 
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where oo = (u;^, . . . , uo^) is a vector of positive weights, L = (L^, . . . , L^) is the 
vector of barriers, K >Q the strike value and tj^ = T. Once again, we con- 
sider one time step per month, which means that for an option with maturity 
time T = 2 as in Table 4, the number of time steps is = 24. From now on, 
we fix / = 5. Hence, in the RIS algorithm, the parameter 6 is of dimension 
d = 120. Even though this is not that huge, it requires much more computa- 
tional time, as the numerical experiments show. For the option of Table 4, a 
standard Monte Carlo computation takes 4.3 CPU seconds and the RRIS al- 
gorithm takes 8.7 CPU seconds, whereas the RIS algorithm needs 22.5 CPU 
seconds. The RIS algorithm is three times slower than the RRIS algorithm, 
in which the parameter 6 lies in the subspace {A-d : 'd € M'*} of dimension 
d' = I = 5 with j4Q_;^)/_,_j j = y/tj — tj-i (convention to = 0) for j = 1, . . . , 
and i = 1, . . . , I, all the other coefficients of A being zero. 

Path-dependent basket options are a prime example of pricing problems 
in which the use of one importance sampling parameter per time step dra- 
matically slows down the computation. Restricting the importance sampling 
parameter space to a subspace of dimension = / = 5, as in the RRIS algo- 
rithm, divides the computational time by 3, whereas the optimal variance of 
the RRIS algorithm is very close to that of the RIS algorithm. Hence, there 
is no point in using one importance sampling parameter per time step. The 
improvement factor in terms of variance provided by the RRIS algorithm 
varies between 10 and 20. Because the RRIS algorithm is twice as slow as a 
standard Monte Carlo computation, the overall gain factor varies between 
5 and 10. 

The payoff does not satisfy the assumptions of Theorems 2.2 and 2.3, 
neither in the RIS nor in the RRIS framework. Nevertheless, it seems rather 
clear from Figure 6 that the RRIS estimator is convergent and asymptot- 
ically normal. Besides, for K = 50, the variance computed on a single run 
of the RRIS algorithm perfectly matches the empirical variance of the his- 
togram. Figure 5 illustrates the asymptotic normality of 9^'^ which is still 
ensures by Proposition 1.2 in this example. These histograms have been 
constructed from 100,000 independent runs of the RRIS algorithm. 









Table 4 








Down-and-out 


call option in 


dimension I 


— 5 with a — 


0.2, So = (50,40,60,30,20), 


L = 


= (40,30,45,20,10), p = 0.3, r 


= 0.05, T = 


2, cj = (0.2, 0.2, 0.2, 0.2, 0.2) and 


n= 100,000 


K 


Price 


Price MC 


Var MC 


Var RIS 


Price RRIS 


Var RRIS 


45 


2.371 


2.348 


22.46 


2.58 


2.378 


2.62 


50 


1.175 


1.178 


10.97 


0.78 


1.179 


0.79 


55 


0.515 


0.513 


4.72 


0.19 


0.517 


0.19 
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0.1 



\ 

\ 



Fig. 5. Limiting distribution of the first component of j?^"* (RRIS) for the option of 
Table 4 with K = 50. 

3.2.2. Dupire's framework. We consider an /-dimensional local volatility 
model in which the dynamics under the risk-neutral measure of each asset 
5* is supposed to be given by 

dSl = Slirdt + a{t, SI) dWi), 5o = (5o\ . . . , S^), 

where W = {W^, . . . , W^) is defined and generated as in the Black-Scholes 
framework. The local volatility function a we have chosen is of the form 

(3.1) a{t, x) = 0.6(1.2 - e-0-i*e-0-00i(--"-^)')e-0-05v^, 

with s > 0. We know that there exists a duality between the variables (t,x) 
and {T,K) in Dupire's framework. Hence, for the formula (3.1) to make 
sense, one should choose s equal to the spot price of the underlying asset so 
that the bottom of the smile is located at the forward money. We refer to 
Figure 7 for an overview of the smile. 



0.4 



0.3 



0.1 



\ 



-4-3-2-10 I 2 3 4 

Fig. 6. Limiting distribution of Mn{9n^ , f) (RRIS) for the option of Table 4 with 
K = 50. 
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Fig. 7. Local volatility function. 

Best-of option. We consider options with payoffs (maxi<j</a;j5^ — K)^, 
where K >0 and {u>^ , . . . ,lo^) is a vector of positive weights. The payoffs 
belong to Tii. To discretize the dynamics, we use an Euler scheme with N = 
100 time steps per year. The results of Table 5 are encouraging. The RRIS 
algorithm with A defined as in the barrier basket option case reduces the 
variance by 6, whereas it only increases the computational time by 2, which 
leads to a gain of 6/2. We do not present any results for the RIS algorithm 
because the extra computational time it requires makes it noncompetitive. 

Conclusion. We propose a fully automatic adaptive importance sampling 
technique for the computation of E(/(G)), where / iM"^ — > M and G is a stan- 
dard d-dimensional normal random vector. For a large class of functions /, 
including many financial payoffs, we prove that our estimator is conver- 
gent and asymptotically normal with optimal limiting variance. Note that 
all of the convergence results stated in Theorems 2.2, 2.3, Corollary 2.4, 
Propositions 2.7, 2.14, Lemma 2.8 and Remarks 2.10, 2.11, 2.16 still hold if 



Table 5 

Best-of option in dimension 12 with p — 0.5, r — 0.05, T — 1, n = 50,000 and uj^ = 1, 

Sq — 50 for all i — 1, ... ,1 



K 


Price 


Price MC 


Var MC 


Price RRIS 


Var RRIS 


70 


3.260 


3.236 


137 


3.299 


24.50 


80 


1.901 


1.917 


94.23 


1.905 


14.09 


90 


1.220 


1.253 


67.70 


1.227 


9.41 
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Mn{e,g) is defined as ^ ELi + 0)e-^-^^-l^l'/2 for any sequence (G'i)i>i 



of i.i.d. d-dimensional standard normal random vectors and, in particular, 
when this sequence is independent from the one, (Gj)j>i, used to compute 
{'&n^)n>i- Our numerical experiments confirm the effectiveness of our esti- 
mator: in comparison with the crude Monte Carlo method, the computation 
time needed to achieve a given precision is divided by a factor between 3 and 
15. Moreover, they suggest that the convergence and asymptotic normality 
of the estimator still hold under weaker assumptions on the function /. In 
view of these numerical results and the definition of Vi, it would be natural 
to investigate the class of functions / such that, for some constants A > 



Unfortunately, we have thus far not been able to derive the asymptotic 
properties of our estimator for such functions. In this work, we have fo- 
cused on importance sampling. A natural extension would be to investigate 
the coupling with stratification techniques in the spirit of [8]. In particular, 
it would be interesting to combine the present importance sampling algo- 
rithm with the adaptive stratified sampling methods recently proposed in [6] 
(adaptive optimization of the proportions of random drawings made in the 
different strata) and [5] (adaptive optimization of the stratification direction 
e G M*^ for a standard normal random vector when the strata are given by 
{x eW'-:e- X £ [yi-i,yi)} with -oo = < 2/i < 2/2 < • • • < y/ = +oo). 
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